In this paper, we propose a new multi-armed bandit problem called the Gambler's Ruin Bandit Problem (GRBP). In the GRBP, the learner proceeds in a sequence of rounds, where each round is a Markov Decision Process (MDP) with two actions (arms): a continuation action that moves the learner randomly over the state space around the current state; and a terminal action that moves the learner directly into one of the two terminal states (the goal state and the dead-end state). The current round ends when a terminal state is reached, and the learner incurs a positive reward only when the goal state is reached. The objective of the learner is to maximize its long-term reward (the expected number of times the goal state is reached), without any prior knowledge of the state transition probabilities. We first prove a result on the form of the optimal policy for the GRBP. Then, we define the regret of the learner with respect to an omnipotent oracle, which acts optimally in each round, and prove that it increases logarithmically over rounds. We also identify a condition under which the learner's regret is bounded. A potential application of the GRBP is optimal medical treatment assignment, in which the continuation action corresponds to a conservative treatment and the terminal action corresponds to a risky treatment such as surgery.
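To make the round structure concrete, the following is a minimal sketch of a single GRBP round under an assumed instantiation: states 0 through N with 0 as the dead-end state and N as the goal state, the continuation action modeled as a +/-1 random walk, and the terminal action jumping to the goal with a fixed probability. The state space, transition dynamics, probabilities, and the threshold-style policy are illustrative assumptions, not the paper's exact model.

```python
import random

N = 10  # assumed state space 0..N; 0 = dead-end, N = goal

def continuation_step(s, p_up=0.5):
    """Continuation action: move randomly around the current state
    (modeled here as a +/-1 random walk; an assumed simplification)."""
    return s + 1 if random.random() < p_up else s - 1

def terminal_step(q_goal=0.3):
    """Terminal action: jump directly to a terminal state
    (goal with assumed probability q_goal, dead-end otherwise)."""
    return N if random.random() < q_goal else 0

def play_round(policy, s0=5):
    """Play one round until a terminal state is reached.
    Returns reward 1 if the goal state is reached, 0 otherwise."""
    s = s0
    while s not in (0, N):
        if policy(s) == "continue":
            s = continuation_step(s)
        else:
            s = terminal_step()
    return 1 if s == N else 0

# Usage: a hypothetical threshold-style policy that takes the
# terminal action only in low states, and continues otherwise.
reward = play_round(lambda s: "terminate" if s <= 3 else "continue")
```

A learner facing the GRBP would play many such rounds, estimating the unknown transition probabilities from observed outcomes and adapting its per-round policy accordingly.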